    Sequence determinants in human polyadenylation site selection

    BACKGROUND: Differential polyadenylation is a widespread mechanism in higher eukaryotes that produces mRNAs with different 3' ends in different contexts. It involves several alternative polyadenylation sites in the 3' UTR, each with its own specific strength. Here, we analyze the vicinity of human polyadenylation signals in search of patterns that would help discriminate strong from weak polyadenylation sites, or true sites from randomly occurring signals. RESULTS: We used human genomic sequences to retrieve the region downstream of polyadenylation signals, which is usually absent from cDNA or mRNA databases. Analyzing 4956 EST-validated polyadenylation sites and their -300/+300 nt flanking regions, we clearly visualized the upstream (USE) and downstream (DSE) sequence elements, both characterized by U-rich (not GU-rich) segments. The presence of a USE and a DSE is the main feature distinguishing true polyadenylation sites from randomly occurring A(A/U)UAAA hexamers. While USEs are associated equally with strong and weak poly(A) sites, DSEs are more conspicuous near strong poly(A) sites. We then used the region encompassing the hexamer and DSE as a training set for poly(A) site identification by the ERPIN program and achieved a prediction specificity of 69 to 85% at a sensitivity of 56%. CONCLUSION: The availability of complete genomes and large EST sequence databases now permits large-scale observation of polyadenylation sites. The U-rich sequences flanking both sides of poly(A) signals contribute to the definition of "true" sites. However, the downstream U-rich sequences may also play an enhancing role. Based on this information, poly(A) site prediction accuracy was moderately but consistently improved compared with the best previously available algorithm.
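
    The central observation above, that true poly(A) sites are flanked by U-rich upstream (USE) and downstream (DSE) elements whereas random A(A/U)UAAA hexamers are not, can be illustrated with a small scanning sketch. It is not the ERPIN model or the authors' pipeline: it assumes a plain genomic or mRNA string, finds both hexamer variants (AAUAAA and AUUAAA, written with T for U), and reports the U fraction in fixed windows on each side. The window length `flank` and the names `SignalHit`, `u_fraction` and `scan_signals` are hypothetical choices made for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Both canonical poly(A) signal variants, A(A/U)UAAA, in DNA alphabet (T for U).
# The lookahead lets overlapping occurrences be reported.
HEXAMER = re.compile(r"(?=(A[AT]TAAA))")


@dataclass
class SignalHit:
    position: int        # 0-based start of the hexamer
    hexamer: str         # AATAAA or ATTAAA
    upstream_u: float    # fraction of T/U in the window before the hexamer (USE side)
    downstream_u: float  # fraction of T/U in the window after the hexamer (DSE side)


def u_fraction(window: str) -> float:
    """Fraction of T (U) in a window; 0.0 for an empty window."""
    return window.count("T") / len(window) if window else 0.0


def scan_signals(seq: str, flank: int = 50) -> list[SignalHit]:
    """Find A(A/U)UAAA hexamers and measure U content on each side.

    `flank` is a hypothetical window length, not a value taken from the study.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    hits = []
    for m in HEXAMER.finditer(seq):
        start = m.start()
        end = start + 6
        upstream = seq[max(0, start - flank):start]
        downstream = seq[end:end + flank]
        hits.append(SignalHit(start, m.group(1), u_fraction(upstream), u_fraction(downstream)))
    return hits
```

    For example, `scan_signals("GGTTTTAATAAAGCTTTTTTGG", flank=5)` returns a single hit at position 6 (`AATAAA`) with upstream and downstream U fractions of 0.8 and 0.6. A real classifier would combine such flank features with the hexamer itself rather than rely on raw U content alone.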

    RBF-TSS: Identification of Transcription Start Site in Human Using Radial Basis Functions Network and Oligonucleotide Positional Frequencies

    Accurate identification of promoter regions and transcription start sites (TSS) in genomic DNA allows for a more complete understanding of gene structure and gene regulation within a given genome. Many recently published methods have achieved high TSS identification accuracy; however, models that capture promoters and TSS more accurately are still needed. We propose a novel method for identifying transcription start sites that improves TSS recognition accuracy over recently published methods. The method incorporates a metric feature based on oligonucleotide positional frequencies, taking into account the nature of promoters. A radial basis function neural network for identifying transcription start sites (RBF-TSS) is employed as the classification algorithm. Using non-overlapping chunks (windows) of size 50 and 500 on the human genome, the proposed method achieves an area under the Receiver Operating Characteristic curve (auROC) of 94.75% and 95.08%, respectively, outperforming existing TSS prediction methods.
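
    The abstract does not spell out the exact positional-frequency metric or the network configuration, so the following is only a sketch under stated assumptions: it tabulates per-position k-mer frequencies over TSS-aligned training sequences, scores a candidate window by summing the frequencies of the k-mers it contains, and passes the resulting feature vector through a Gaussian radial basis function layer. The function names, the choice of k, and the fixed centers, width `sigma` and weights are all hypothetical; in practice the centers and output weights would be learned from training data, which is not shown.

```python
import math
from collections import Counter


def positional_frequencies(aligned_seqs, k=2):
    """Per-position k-mer frequencies over equal-length, TSS-aligned sequences."""
    length = len(aligned_seqs[0])
    table = []
    for i in range(length - k + 1):
        counts = Counter(s[i:i + k] for s in aligned_seqs)
        table.append({kmer: c / len(aligned_seqs) for kmer, c in counts.items()})
    return table


def pf_score(seq, table, k=2):
    """Sum of the positional frequencies of the k-mers in seq (unseen k-mers add 0)."""
    return sum(table[i].get(seq[i:i + k], 0.0) for i in range(len(table)))


def rbf_output(x, centers, sigma, weights, bias=0.0):
    """Gaussian RBF network: bias + sum_j w_j * exp(-||x - c_j||^2 / (2 * sigma^2))."""
    total = bias
    for center, weight in zip(centers, weights):
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        total += weight * math.exp(-dist2 / (2 * sigma ** 2))
    return total


# Hypothetical usage: one positional-frequency feature per window.
table = positional_frequencies(["TATA", "TATG"], k=2)
feature = [pf_score("TATA", table)]
score = rbf_output(feature, centers=[[2.5], [0.0]], sigma=1.0, weights=[1.0, -1.0])
```

    With these toy values the window "TATA" gets a feature of 2.5, sits on the positively weighted center, and receives a positive score. The Gaussian kernel makes each hidden unit respond only to windows whose features lie near its center, which is what distinguishes an RBF network from a linear classifier on the same features.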

    ISOL@: an Italian SOLAnaceae genomics resource

    BACKGROUND: Present-day '-omics' technologies produce overwhelming amounts of data, including genome sequences, information on gene expression (transcripts and proteins) and on cell metabolic status. These data represent multiple aspects of a biological system and need to be investigated as a whole to shed light on the mechanisms that underpin system functionality. The gathering and convergence of data generated by high-throughput technologies, the effective integration of different data sources, and the analysis of information content based on comparative approaches are key to meaningful biological interpretation. Within the framework of the International Solanaceae Genome Project, we propose here ISOLA, an Italian SOLAnaceae genomics resource. RESULTS: ISOLA (available at http://biosrv.cab.unina.it/isola) is a trial platform conceived as a multi-level computational environment. ISOLA currently consists of two main levels: the genome level and the expression level. The cornerstone of the genome level is the Solanum lycopersicum genome draft sequences generated by the International Tomato Genome Sequencing Consortium. The basic element of the expression level, by contrast, is transcriptome information from different Solanaceae species, mainly in the form of species-specific comprehensive collections of Expressed Sequence Tags (ESTs). Cross-talk between the genome and expression levels relies on shared data sources and on tools that enhance data quality, extract information content from the component parts of each level, and produce value-added biological knowledge. CONCLUSIONS: ISOLA is the result of a bioinformatics effort that addresses the challenges of the post-genomics era. It is designed to exploit '-omics' data through effective integration in order to acquire biological knowledge and approach a systems biology view.
Beyond providing experimental biologists with a preliminary annotation of the tomato genome, this effort aims to produce a trial computational environment in which different aspects and details are maintained insofar as they are relevant for analyzing the organization, functionality and evolution of the Solanaceae family.

    MetNetAPI: A flexible method to access and manipulate biological network data from MetNet

    BACKGROUND: Convenient programmatic access to different biological databases allows automated integration of scientific knowledge. Many databases support a function to download files or data snapshots, or a webservice that offers "live" data. However, the functionality that a database offers cannot be represented in a static data download file, and webservices may consume considerable computational resources on the host server. RESULTS: MetNetAPI is a versatile Application Programming Interface (API) to the MetNetDB database. It abstracts, captures and retains operations away from a biological network repository and website. A range of database functions, previously only available online, can be applied immediately (and independently of the website) to a dataset of interest. Data are available in four layers: molecular entities, localized entities (linked to a specific organelle), interactions, and pathways. Navigation between these layers is intuitive (e.g., one can request the molecular entities in a pathway, or request the pathways in which a specific entity participates). Data retrieval can be customized: Network objects allow new pathways and interactions to be constructed and existing ones to be integrated, and the results can be uploaded back to our server. In contrast to webservices, the computational demand on the host server is limited to processing data-related queries. CONCLUSIONS: An API provides several advantages to a systems biology software platform. MetNetAPI illustrates an interface with a central repository of data that represents the complex interrelationships of a metabolic and regulatory network. As an alternative to data dumps and webservices, it allows access to a current, "live" database and exposes analytical functions to application developers, yet it requires only limited server-side resources (a thin server/fat client setup).
The API is available for Java, Microsoft .NET and R programming environments and offers flexible query and broad data-retrieval methods. Data retrieval can be customized to client needs, and the API offers a framework to construct and manipulate user-defined networks. The design principles can be used as a template to build programmable interfaces for other biological databases. The API software and tutorials are available at http://www.metnetonline.org/api.
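
    The four-layer data model and the two navigation directions described above (pathway to entities, entity to pathways) can be pictured with a small in-memory sketch. This is not MetNetAPI, which is provided for Java, Microsoft .NET and R; every class and function name below is hypothetical, and the glucose example is illustrative data only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class MolecularEntity:          # layer 1: a molecule, independent of location
    name: str


@dataclass(frozen=True)
class LocalizedEntity:          # layer 2: a molecule in a specific organelle
    entity: MolecularEntity
    organelle: str


@dataclass(frozen=True)
class Interaction:              # layer 3: e.g. a reaction linking localized entities
    name: str
    participants: tuple[LocalizedEntity, ...]


@dataclass
class Pathway:                  # layer 4: a collection of interactions
    name: str
    interactions: list[Interaction] = field(default_factory=list)

    def localized_entities(self) -> set[LocalizedEntity]:
        return {p for i in self.interactions for p in i.participants}

    def molecular_entities(self) -> set[MolecularEntity]:
        # Navigate downward: pathway -> interactions -> localized -> molecular.
        return {le.entity for le in self.localized_entities()}


def pathways_containing(entity: MolecularEntity, pathways: list[Pathway]) -> list[str]:
    """Navigate upward: names of pathways in which `entity` participates."""
    return [p.name for p in pathways if entity in p.molecular_entities()]


# Illustrative data only.
glucose = MolecularEntity("glucose")
g6p = MolecularEntity("glucose-6-phosphate")
hexokinase = Interaction("hexokinase", (LocalizedEntity(glucose, "cytosol"),
                                        LocalizedEntity(g6p, "cytosol")))
glycolysis = Pathway("glycolysis", [hexokinase])
```

    Making the entity classes frozen keeps them hashable, so set membership answers "in which pathways does this entity participate?" without extra indexing. A client-side model of this kind also illustrates the thin server/fat client split: once data are retrieved, navigation happens locally rather than on the host server.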

    Human SHBG mRNA Translation Is Modulated by Alternative 5′-Non-Coding Exons 1A and 1B

    BACKGROUND: The human sex hormone-binding globulin (SHBG) gene comprises at least 6 different transcription units (TU-1, -1A, -1B, -1C, -1D and -1E) and is regulated by no fewer than 6 different promoters. The best characterized are TU-1 and TU-1A: TU-1 is responsible for producing plasma SHBG, while TU-1A is transcribed and translated in the testis. Transcription of the recently described TU-1B, -1C and -1D has been demonstrated in human prostate tissue and prostate cancer cell lines, as well as in other human cell lines such as HeLa, HepG2, HEK 293, CW 9019 and IMR-32. However, no data demonstrating their translation have been reported. In the present study, we aimed to determine whether TU-1A and TU-1B are indeed translated in the human prostate and whether the 5'UTR exons 1A and 1B differentially regulate SHBG translation. RESULTS: Cis-regulatory elements that could potentially regulate translation were identified within the 5'UTRs of SHBG TU-1A and TU-1B. Although full-length SHBG TU-1A and TU-1B mRNAs were present in prostate cancer cell lines, endogenous SHBG protein was not detected by western blot in any of them. LNCaP prostate cancer cells transfected with several SHBG constructs containing exons 2 to 8 but lacking the 5'UTR sequence did show SHBG translation, whereas inclusion of the 5'UTR sequences of either exon 1A or 1B caused a dramatic decrease in SHBG protein levels. The molecular weight of SHBG did not differ between cells transfected with constructs with or without the 5'UTR sequence, confirming that the first in-frame ATG of exon 2 is the translation start site of TU-1A and TU-1B. CONCLUSIONS: The use of the alternative SHBG first exons 1A and 1B differentially inhibits translation from the ATG located in exon 2, which codes for methionine 30 in transcripts that begin with the exon 1 sequence.

    High Sensitivity TSS Prediction: Estimates of Locations Where TSS Cannot Occur

    Although transcription in mammalian genomes can initiate from various genomic positions (e.g., 3′UTRs, coding exons, etc.), most genomic locations are not prone to transcription initiation. Estimating such collections of non-TSS locations (NTLs) is of both practical and theoretical interest. Identifying large portions of NTLs can help focus the search for TSS locations and thus contribute to promoter and gene finding. It can also help assess the 5′ completeness of expressed sequences and contribute to more successful experimental designs and more accurate gene annotation. Using comprehensive collections of Cap Analysis of Gene Expression (CAGE) and other transcript data from the mouse and human genomes, we developed a methodology that performs computational TSS prediction with very high sensitivity and thereby annotates, with high accuracy and in a strand-specific manner, locations in mammalian genomes that are highly unlikely to harbor transcription start sites (TSSs). The properties of the immediate genomic neighborhood of 98,682 accurately determined mouse and 113,814 human TSSs are used to determine features that distinguish genomic transcription initiation locations from those that are unlikely to initiate transcription. Our algorithm uses various constraining properties of features identified in the upstream and downstream regions around TSSs, as well as statistical analyses of these surrounding regions.
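
    One way to read "TSS prediction with very high sensitivity" is as a threshold rule: pick the score cutoff that still retains a target fraction of known TSSs (for example, CAGE-supported ones), then label every position scoring below it as a non-TSS location. The sketch below assumes a generic per-position scorer and strand-aware position keys; the features and scoring actually used by the authors are not reproduced here, and the function names and the 0.99 default are hypothetical.

```python
import math


def threshold_for_sensitivity(tss_scores, sensitivity=0.99):
    """Largest cutoff t such that at least `sensitivity` of known TSS scores are >= t."""
    if not tss_scores:
        raise ValueError("need at least one known TSS score")
    ranked = sorted(tss_scores, reverse=True)
    # round() guards against float noise such as 0.7 * 10 == 7.000000000000001.
    needed = math.ceil(round(sensitivity * len(ranked), 9))
    needed = min(len(ranked), max(1, needed))
    return ranked[needed - 1]


def non_tss_locations(position_scores, threshold):
    """Positions (e.g. (chrom, pos, strand) keys) scoring below the cutoff are NTLs."""
    return [pos for pos, score in position_scores.items() if score < threshold]


# Hypothetical usage: strand-specific keys, so the two strands are judged separately.
cutoff = threshold_for_sensitivity([0.9, 0.8, 0.7, 0.2], sensitivity=0.75)
ntls = non_tss_locations({("chr1", 100, "+"): 0.1, ("chr1", 100, "-"): 0.9}, cutoff)
```

    Here `cutoff` is 0.7 and only the plus-strand position is labelled an NTL. Raising `sensitivity` lowers the cutoff, so fewer positions are excluded, but those that are excluded are more reliably non-TSS, which is the trade-off the approach exploits.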

    ACC2 Is Expressed at High Levels Human White Adipose and Has an Isoform with a Novel N-Terminus

    Acetyl-CoA carboxylases ACC1 and ACC2 catalyze the carboxylation of acetyl-CoA to malonyl-CoA, regulating fatty acid synthesis and oxidation, and are potential targets for the treatment of metabolic syndrome. Expression of ACC1 in rodent lipogenic tissues and of ACC2 in rodent oxidative tissues, coupled with the predicted localization of ACC2 to the mitochondrial membrane, has suggested separate functional roles for ACC1 in lipogenesis and ACC2 in fatty acid oxidation. We find, however, that human adipose tissue, unlike rodent adipose, expresses more ACC2 mRNA than the oxidative tissues muscle and heart do. Human adipose, along with human liver, expresses more ACC2 than ACC1. Using RT-PCR, real-time PCR and immunoprecipitation, we report a novel isoform of ACC2 (ACC2.v2) that is expressed at significant levels in human adipose. The protein generated by this isoform has enzymatic activity, is endogenously expressed in adipose, and lacks the N-terminal sequence. Both ACC2 isoforms are capable of de novo lipogenesis, suggesting that ACC2, in addition to ACC1, may play a role in lipogenesis. The results demonstrate a significant difference in ACC expression between humans and rodents, which may complicate the use of rodent models for the development of ACC inhibitors.

    VILIP-1 Downregulation in Non-Small Cell Lung Carcinomas: Mechanisms and Prediction of Survival

    VILIP-1, a member of the neuronal Ca++ sensor protein family, acts as a tumor suppressor gene in an experimental animal model by inhibiting cell proliferation, adhesion and invasiveness of squamous cell carcinoma cells. Western blot analysis of human tumor cells showed that VILIP-1 expression was undetectable in several types of human tumor cells, including 11 of 12 non-small cell lung carcinoma (NSCLC) cell lines. The downregulation of VILIP-1 was due to loss of VILIP-1 mRNA transcripts; no rearrangements, large gene deletions or mutations were found. Hypermethylation of the VILIP-1 promoter played an important role in gene silencing: in most VILIP-1-silent cells the VILIP-1 promoter was methylated, and in vitro methylation of the VILIP-1 promoter reduced its activity in a promoter-reporter assay. Transcriptional activity of the endogenous VILIP-1 promoter was recovered by treatment with 5-aza-2′-deoxycytidine (5-Aza-dC). Trichostatin A (TSA), a histone deacetylase inhibitor, potently induced VILIP-1 expression, indicating that histone deacetylation is an additional mechanism of VILIP-1 silencing. TSA increased histone H3 and H4 acetylation in the region of the VILIP-1 promoter. Furthermore, statistical analysis of expression and promoter methylation (n = 150 primary NSCLC samples) showed a significant relationship between promoter methylation and downregulation of protein expression, as well as between survival and decreased or absent VILIP-1 expression in lung cancer tissues (p<0.0001). VILIP-1 expression is silenced by promoter hypermethylation and histone deacetylation in aggressive NSCLC cell lines and primary tumors, and its clinical evaluation could serve as a predictor of short-term survival in lung cancer patients.